Abstract. Manifold learning has been successfully applied to a variety
of medical imaging problems. Its use in real-time applications requires
fast projection onto the low-dimensional space. To this end, out-of-sample
extensions are applied by constructing an interpolation function
that maps from the input space to the low-dimensional manifold. Commonly
used approaches such as the Nyström extension and kernel ridge
regression require using all training points. We propose an interpolation
function that only depends on a small subset of the input training data. 
Consequently, in the testing phase each new point only needs to be com- 
pared against a small number of input training data in order to project 
the point onto the low-dimensional space. We interpret our method as 
an out-of-sample extension that approximates kernel ridge regression. 
Our method involves solving a simple convex optimization problem and 
has the attractive property of guaranteeing an upper bound on the ap- 
proximation error, which is crucial for medical applications. Tuning this 
error bound controls the sparsity of the resulting interpolation function. 
We illustrate our method in two clinical applications that require fast 
mapping of input images onto a low-dimensional space. 

1 Introduction 

Manifold learning maps high-dimensional data to a low-dimensional manifold 
and has recently been successfully applied to a variety of applications.
Specifically in medical imaging, manifold learning has been used in
segmentation [24], registration [12,15], computational anatomy [?],
classification [6,22], detection [20], and respiratory gating [10,23].
But to the best of our knowledge, little work
has been done using manifold learning for medical imaging applications that 
require fast projections onto a low-dimensional space. 

In this paper, we demonstrate a method that achieves fast projection of input 
data onto a low-dimensional manifold by constructing a projection function that 
only depends on a small subset of the training data. Our method is a sparse 
variant of kernel ridge regression [18] and can be interpreted as an interpola- 
tion function optimized to only use a few of the training data. Furthermore, 
the construction of the interpolation function guarantees an upper bound on an 
interpolation error for training data. The error is measured in terms of the aver- 
age squared Euclidean distance between the predicted points of the interpolator 
versus those of kernel ridge regression using all the points. As our interpolator
has no parametric model for the data points, its complexity is driven by the 
complexity of the training data and the bound on the approximation error. 

Related work on out-of-sample extensions. Manifold learning is a spe- 
cific case of nonlinear dimensionality reduction and refers to a host of different 
algorithms [13]. In medical image analysis, manifold learning is used to construct
a low-dimensional space for images in which subsequent statistical analysis (re- 
gression, classification, etc.) is performed. Many manifold learning techniques do 
not construct a mapping of the entire input space but only of the training points. 
For these methods, estimating a new point's location in the low-dimensional 
space is performed via an out-of-sample extension [5], with Nyström extensions
commonly used. For certain manifold learning methods, a Nyström extension is
a special case of kernel ridge regression [19], and for both the Nyström extension
and kernel ridge regression, the resulting interpolation function for mapping a 
new input point to the low-dimensional space depends on all training data. Thus, 
we need to compare a new point to all training data points, which is computa- 
tionally expensive for volumetric images, especially if the number of input data 
used to learn the manifold is large. 

Our work is most similar to reduced rank kernel ridge regression [7], which 
also approximates kernel ridge regression by only using a small number of input 
training points. Reduced rank kernel ridge regression greedily selects training 
points to minimize a particular cost function. Specifically, the algorithm incre- 
mentally adds a training point that causes the largest decrease in overall cost. 
Different criteria can be used to terminate the greedy procedure, such as
reaching a pre-specified number of training points or having the overall
cost drop below a pre-specified error tolerance. Importantly, for
medical applications, the latter criterion is more directly connected to the error 
analysis of the whole processing pipeline. Our approach also requires the user 
to specify a desired error tolerance but uses a different cost function. Rather 
than using a greedy approach to select which training points to add, we solve 
a convex optimization problem implied by our cost function. We remark that 
the proposed cost function also differs from that of support vector regression [5],
which essentially achieves sparsity by excluding training points that map
sufficiently close to the estimated function. Our cost is more lenient, asking that
an average error be small rather than asking that an error be small for each 
individual training point. 

Contributions. For high-dimensional input points $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ and
their low-dimensional representations $y_1, y_2, \ldots, y_n \in \mathbb{R}^p$ as computed by any
manifold learning algorithm, we propose a convex program for constructing an
out-of-sample extension that guarantees a bound on the approximation error.
Formally, if $\hat{f} : \mathbb{R}^d \to \mathbb{R}^p$ is the out-of-sample extension function estimated
via kernel ridge regression, then the sparse projection function
$\tilde{f} : \mathbb{R}^d \to \mathbb{R}^p$ constructed by our algorithm satisfies

\[
\frac{1}{n} \sum_{i=1}^{n} \bigl\| \hat{f}(x_i) - \tilde{f}(x_i) \bigr\|_2^2 \le \varepsilon^2, \tag{1}
\]

where $\|\cdot\|_2$ denotes the Euclidean norm, $\varepsilon > 0$ is a pre-specified error tolerance,
and $\tilde{f}$ depends only on a small subset of $x_1, \ldots, x_n$. The size of the subset, i.e.,
the sparsity of the resulting function $\tilde{f}$, depends on tolerance $\varepsilon$ and training pairs
$(x_1, y_1), \ldots, (x_n, y_n)$. Finding the smallest such subset is NP-hard. We instead
consider a convex relaxation with sparsity induced by a mixed $\ell_1/\ell_2$ norm. While
the proposed sparse approximation to kernel ridge regression can be used more 
generally for other multivariate regression tasks, we restrict our focus in this 
paper to out-of-sample extensions for manifold learning. 

We apply our method to two medical imaging applications that require a fast 
projection onto a low-dimensional space. The first application is respiratory gat- 
ing in ultrasound, where we assign the breathing state to each ultrasound frame 
during the acquisition in real-time. The second application is the estimation of 
a patient's position in a magnetic resonance imaging (MRI) scanner while the 
patient is being moved to a target location. 



2 Background 

Our method builds heavily on kernel ridge regression [18], reviewed below. We
also briefly discuss the result that a Nyström extension is a special case of
kernel ridge regression under certain conditions [19]. As a consequence, our sparse
approximation to kernel ridge regression also contains a sparse approximation
to the widely used Nyström extension.

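Kernel ridge regression. Let $\mathcal{H}$ be a family of functions mapping $\mathbb{R}^d$ to $\mathbb{R}$
such that $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) [1] with kernel function
$\mathcal{K} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. Given points $x_1, \ldots, x_n \in \mathbb{R}^d$ and
$y_1, \ldots, y_n \in \mathbb{R}^p$, we assume that there exists a function
$f^* = (f_1^*, \ldots, f_p^*) \in \mathcal{H}^p$ such that for each $i$, we have
$y_i = f^*(x_i) + w_i$ for some noise term $w_i \in \mathbb{R}^p$. Kernel ridge regression seeks an
estimate $\hat{f}$ of function $f^*$ by solving

\[
\hat{f} = \operatorname*{arg\,min}_{(f_1, \ldots, f_p) \in \mathcal{H}^p}
  \sum_{j=1}^{p} \Bigl[ \sum_{i=1}^{n} \bigl( Y_{ij} - f_j(x_i) \bigr)^2 + \lambda \|f_j\|_{\mathcal{H}}^2 \Bigr], \tag{2}
\]

where matrix $Y \in \mathbb{R}^{n \times p}$ contains data point $y_i$ as its $i$-th row, constant $\lambda > 0$
controls the amount of regularization, and $\|\cdot\|_{\mathcal{H}}$ is the norm induced by the inner
product of $\mathcal{H}$. The solution of optimization problem (2) is

\[
\hat{f}(\cdot) = \sum_{i=1}^{n} \mathcal{K}(\cdot, x_i)\, \hat{\alpha}_i, \tag{3}
\]

where $\hat{\alpha}_i$ refers to the $i$-th row of the $n$-by-$p$ matrix

\[
\hat{\alpha} = (K + \lambda I_{n \times n})^{-1} Y, \tag{4}
\]

matrix $K \in \mathbb{R}^{n \times n}$ is given by $K_{ij} = \mathcal{K}(x_i, x_j)$, and $I_{n \times n}$ is the $n$-by-$n$ identity
matrix [18].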
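
As an illustration of eqs. (3) and (4), the following Python/NumPy sketch fits and evaluates a kernel ridge regressor. It is a minimal example under stated assumptions rather than the implementation used in this paper: the Gaussian kernel, the function names (`gaussian_kernel`, `krr_fit`, `krr_predict`), and the random toy data are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix with entries exp(-||a - b||^2 / sigma^2) (assumed kernel)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def krr_fit(X, Y, lam, sigma=1.0):
    """Eq. (4): alpha_hat = (K + lam * I)^{-1} Y, one row per training point."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def krr_predict(X_new, X, alpha_hat, sigma=1.0):
    """Eq. (3): f_hat(x) = sum_i K(x, x_i) alpha_hat_i for each row x of X_new."""
    return gaussian_kernel(X_new, X, sigma) @ alpha_hat

# Toy data: n = 50 points in R^3 with targets in R^2 (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.normal(size=(50, 2))
alpha_hat = krr_fit(X, Y, lam=0.1)
Y_fit = krr_predict(X, X, alpha_hat)   # rows of K @ alpha_hat
```

Note that `krr_predict` evaluates the kernel against every training point, which is the per-query cost that the sparse interpolator is meant to reduce.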
Nyström extension. The Nyström method approximates a certain type of
eigenfunction problem and is used for out-of-sample extensions in manifold
learning [5]. For manifold learning algorithms that assign the low-dimensional
coordinates directly from the eigenvectors of $K$, e.g., Isomap [21], locally linear
embeddings [17], and Laplacian eigenmaps [4], we can derive the Nyström extension as
a special case of kernel ridge regression with $\lambda = 0$. Specifically, with
eigendecomposition $K = \Phi \Lambda \Phi^{-1}$, where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$
and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, we consider when the low-dimensional embedding is
given by $Y = \Phi_\ell$, the matrix consisting of the first $\ell$ columns of $\Phi$. If we use
$\phi^{(j)}$ to denote the $j$-th column of $\Phi$ and $\Lambda_\ell = \operatorname{diag}(\lambda_1, \ldots, \lambda_\ell)$, then with
$\lambda = 0$ and $Y = \Phi_\ell$, eq. (4) reduces to

\[
\hat{\alpha} = K^{-1} Y = \Phi \Lambda^{-1} \Phi^{-1} \Phi_\ell = \Phi_\ell \Lambda_\ell^{-1}. \tag{5}
\]

Letting $\phi_i^{(j)}$ refer to the $i$-th element of $\phi^{(j)}$, and substituting eq. (5) into eq. (3),
we see that, for a new point $x \in \mathbb{R}^d$, the $j$-th element of $\hat{f}(x)$ is given by

\[
\hat{f}_j(x) = \sum_{i=1}^{n} \mathcal{K}(x, x_i)\, \hat{\alpha}_{ij}
  = \sum_{i=1}^{n} \mathcal{K}(x, x_i) \Bigl( \frac{1}{\lambda_j} \phi_i^{(j)} \Bigr)
  = \frac{1}{\lambda_j} \sum_{i=1}^{n} \phi_i^{(j)}\, \mathcal{K}(x, x_i), \tag{6}
\]

which is the formula for the low-dimensional embedding of $x$ using the Nyström
extension [5]. Importantly, kernel function $\mathcal{K}$ depends on the choice of a manifold
learning algorithm [5]. The above relationship shows that for certain manifold
learning algorithms, kernel ridge regression is a richer model for out-of-sample
extensions than the Nyström extension.
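
The reduction above can be checked numerically. The sketch below is only an illustration: it assumes a symmetric positive definite Gaussian Gram matrix (so that the inverse of the eigenvector matrix is its transpose), and the toy data and variable names are hypothetical. It compares the closed form of eq. (5) with a direct solve of eq. (4) at zero regularization, and evaluates the Nyström formula at one training point.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)                       # symmetric positive definite Gram matrix

lam, Phi = np.linalg.eigh(K)          # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]         # reorder so that lam[0] >= lam[1] >= ...
lam, Phi = lam[order], Phi[:, order]

ell = 2
Y = Phi[:, :ell]                      # embedding = first ell eigenvectors
alpha_closed = Y / lam[:ell]          # eq. (5): Phi_ell @ inv(Lambda_ell)
alpha_solve = np.linalg.solve(K, Y)   # eq. (4) with regularization set to 0

# Nystrom formula for coordinate j, evaluated at training point x_k:
# (1 / lam_j) * sum_i phi_i^(j) * K(x_k, x_i)
k = 5
nystrom_k = (Phi[:, :ell] * K[k][:, None]).sum(axis=0) / lam[:ell]
```

Under these assumptions, `alpha_closed` and `alpha_solve` agree up to numerical error, and `nystrom_k` reproduces the training embedding `Y[k]`.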
3 Sparse Approximation to Kernel Ridge Regression

We now present our method. We seek an interpolation function $\tilde{f} : \mathbb{R}^d \to \mathbb{R}^p$
within a family of functions
$\mathcal{G} = \{ f(\cdot) = \sum_{i=1}^{n} \mathcal{K}(\cdot, x_i)\, \alpha_i : \alpha \in \mathbb{R}^{n \times p} \}$, with
many vectors $\alpha_i \in \mathbb{R}^p$ equal to zero while ensuring that upper bound (1) holds.
In particular, we formulate a convex optimization problem where $\alpha \in \mathbb{R}^{n \times p}$ is
the only decision variable; solving this problem yields $\tilde{\alpha}$ that implies a sparse
approximation $\tilde{f}$ to the kernel ridge regression solution $\hat{f}$.

Because we optimize over functions in $\mathcal{G}$, upper bound (1) can be simplified
by noting that $\sum_{i=1}^{n} \|\hat{f}(x_i) - f(x_i)\|_2^2 = \|K\hat{\alpha} - K\alpha\|_F^2$, where $\|\cdot\|_F$ denotes the
Frobenius norm, and $\hat{\alpha}$ is given by eq. (4). In fact, $\hat{f}(x_i)$ and $f(x_i)$ are given by
the $i$-th rows of $K\hat{\alpha}$ and $K\alpha$, respectively. Thus, bound (1) can be rewritten as
$\|K\hat{\alpha} - K\alpha\|_F^2 \le n\varepsilon^2$. Satisfying this constraint while encouraging the number of
nonzero vectors $\alpha_i$ to be small can be achieved by solving the following convex
optimization problem:

\[
\tilde{\alpha} = \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^{n \times p}} \sum_{i=1}^{n} \|\alpha_i\|_2
  \quad \text{s.t.} \quad \|K\hat{\alpha} - K\alpha\|_F^2 \le n\varepsilon^2. \tag{7}
\]
By minimizing the mixed $\ell_1/\ell_2$ norm of $\alpha$, we encourage each vector $\alpha_i$ to either
consist of all zeros or all non-zero entries [2]. Note that if $p = 1$ and we instead
ask for the sparsest solution possible, then the objective function becomes the $\ell_0$
norm (i.e., the number of nonzero elements) of $\alpha$, and the optimization problem
itself becomes NP-hard [2].

To solve optimization problem (7), we reduce it to solving many instances of
its unconstrained Lagrangian form for which there is already a fast solver.
Specifically, by Lagrangian duality and convexity, solving optimization problem (7) is
equivalent to solving the dual problem

\[
\max_{\xi \ge 0} \min_{\alpha \in \mathbb{R}^{n \times p}}
  \Bigl\{ \sum_{i=1}^{n} \|\alpha_i\|_2 + \xi \bigl( \|K\hat{\alpha} - K\alpha\|_F^2 - n\varepsilon^2 \bigr) \Bigr\}
  = \sup_{\xi > 0} \xi \bigl[ S(1/\xi) - n\varepsilon^2 \bigr], \tag{8}
\]

where $\xi$ is a Lagrange multiplier, and

\[
S(\gamma) = \min_{\alpha \in \mathbb{R}^{n \times p}}
  \Bigl\{ \|K\hat{\alpha} - K\alpha\|_F^2 + \gamma \sum_{i=1}^{n} \|\alpha_i\|_2 \Bigr\}. \tag{9}
\]

For a fixed $\xi$, we can compute $S(1/\xi)$ efficiently using the fast iterative
shrinkage-thresholding algorithm (FISTA) [3]. Moreover, from a standard result of
Lagrangian duality, dual problem (8) maximizes a concave function, which in this
case is only over scalar variable $\xi$. Thus, we can efficiently solve the right hand
side of (8) by making as many calls to FISTA as needed to achieve the desired
accuracy in estimating $\xi$. Given the final estimated value $\tilde{\xi}$ of $\xi$, we recover
solution $\tilde{\alpha}$ by seeking $\alpha \in \mathbb{R}^{n \times p}$ that yields $S(1/\tilde{\xi})$ in eq. (9).
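
One way to make this procedure concrete is sketched below in Python with NumPy. It is a minimal illustration under assumptions that do not come from this paper: the function names (`group_soft_threshold`, `fista_group_lasso`, `sparse_krr`), the fixed iteration counts, the cold start at zero, and the toy data are all hypothetical. In addition, instead of maximizing the concave dual (8) over the multiplier directly, the sketch swaps in a simpler search: it bisects on the penalty weight in eq. (9) (the reciprocal of the multiplier) and keeps the largest weight whose FISTA solution still satisfies the constraint of problem (7), relying on the residual growing with the penalty weight.

```python
import numpy as np

def group_soft_threshold(V, thresh):
    """Row-wise shrinkage: proximal operator of thresh * sum_i ||v_i||_2."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * V

def fista_group_lasso(K, B, gamma, n_iter=500):
    """Approximately solve eq. (9): min_a ||B - K a||_F^2 + gamma * sum_i ||a_i||_2."""
    G, C = K.T @ K, K.T @ B
    L = 2.0 * np.linalg.norm(K, 2) ** 2       # Lipschitz constant of the gradient
    a_prev = np.zeros_like(B)                 # cold start (an assumption)
    z, t = a_prev.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * (G @ z - C)              # gradient of the quadratic term at z
        a = group_soft_threshold(z - grad / L, gamma / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = a + ((t - 1.0) / t_next) * (a - a_prev)   # momentum step
        a_prev, t = a, t_next
    return a_prev

def sparse_krr(K, alpha_hat, eps, n_bisect=25):
    """Bisect on gamma, keeping ||K alpha_hat - K a||_F^2 <= n * eps^2."""
    n = K.shape[0]
    B = K @ alpha_hat
    budget = n * eps ** 2
    if np.sum(B ** 2) <= budget:              # the all-zero matrix is feasible
        return np.zeros_like(alpha_hat)
    best = alpha_hat.copy()                   # always feasible (zero residual)
    lo, hi = 0.0, 2.0 * np.linalg.norm(K.T @ B, axis=1).max()
    for _ in range(n_bisect):
        gamma = 0.5 * (lo + hi)
        a = fista_group_lasso(K, B, gamma)
        if np.sum((B - K @ a) ** 2) <= budget:
            best, lo = a, gamma               # feasible: try a stronger penalty
        else:
            hi = gamma                        # infeasible: weaken the penalty
    return best

# Toy problem (illustrative values only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
Y = np.column_stack([np.sin(3.0 * X[:, 0]), X[:, 1] ** 2])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 0.5 ** 2)
alpha_hat = np.linalg.solve(K + 0.1 * np.eye(80), Y)
alpha_tilde = sparse_krr(K, alpha_hat, eps=0.05)
n_support = int(np.count_nonzero(np.linalg.norm(alpha_tilde, axis=1)))
```

Because every candidate is checked against the constraint before it is accepted, the returned matrix is feasible for problem (7) even when FISTA has not fully converged; how sparse it is depends on the iteration budget and the data.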

Once the coefficient matrix $\tilde{\alpha}$ is obtained, the interpolation function $\tilde{f}$ is
uniquely defined:

\[
\tilde{f}(\cdot) = \sum_{i=1}^{n} \mathcal{K}(\cdot, x_i)\, \tilde{\alpha}_i. \tag{10}
\]

The number of nonzero $\tilde{\alpha}_i \in \mathbb{R}^p$ vectors depends on error tolerance $\varepsilon$,
regularization parameter $\lambda$, the kernel function $\mathcal{K}$, and the data itself. We refer to the
data points $x_i$ corresponding to nonzero $\tilde{\alpha}_i$ as support vectors. As we observe
empirically in the next section, decreasing either parameter $\varepsilon$ or $\lambda$ generally
produces more support vectors used in projection. This is not surprising:
increasing $\varepsilon$ increases the size of the feasible set in optimization problem (7), allowing
for potentially more candidate solutions $\alpha$, including sparser ones. Meanwhile, as
$\lambda \to \infty$, the coefficient matrix $\hat{\alpha}$ for kernel ridge regression, defined in eq. (4),
approaches $\frac{1}{\lambda} Y$, which goes to 0 for large $\lambda$. As a result, $\tilde{\alpha}$ also gets pushed to 0.

We can choose the similarity kernel $\mathcal{K}$ to match the specific choice of manifold
learning algorithm used to embed the training data. This allows us to provide
a sparse approximation to the Nyström extension as discussed in Section 2.
Alternatively, our method is applicable to any kernel $\mathcal{K}$, regardless of the manifold
learning algorithm used for training.

Lastly, we note that solving the convex program (8) to obtain $\tilde{\alpha}$ incurs an
offline, one-time cost. During testing, we use the resulting sparse interpolator (10),
whose computational cost is directly proportional to the number of support
vectors. Our interpolator will always be at least as fast to compute as that of
kernel ridge regression, which uses all the training points as support vectors and
corresponds to the special case of our interpolator where $\varepsilon = 0$.
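
The test-time saving can be illustrated with a short Python/NumPy sketch. The helper names (`compress`, `project`), the Gaussian kernel, and the hand-written coefficient matrix are assumptions made for the example; in practice the coefficients would come from solving the convex program offline.

```python
import numpy as np

def compress(X_train, alpha_tilde, tol=0.0):
    """Keep only the training points whose coefficient row is nonzero."""
    keep = np.linalg.norm(alpha_tilde, axis=1) > tol
    return X_train[keep], alpha_tilde[keep]

def project(x_new, X_sv, alpha_sv, sigma=1.0):
    """Eq. (10) for one new point, summing over the given points only."""
    k = np.exp(-((X_sv - x_new) ** 2).sum(axis=1) / sigma**2)
    return k @ alpha_sv

# Four training points, two of which have all-zero coefficient rows.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
alpha_tilde = np.array([[0.5, -1.0], [0.0, 0.0], [0.0, 0.0], [2.0, 0.25]])

X_sv, alpha_sv = compress(X_train, alpha_tilde)   # done once, offline
x_new = np.array([0.2, 0.7])
y_sparse = project(x_new, X_sv, alpha_sv)         # 2 kernel evaluations
y_full = project(x_new, X_train, alpha_tilde)     # 4 kernel evaluations
```

Both calls return the same embedding because zero rows contribute nothing to the sum; only the number of kernel evaluations per query differs.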



4 Results 

We apply our sparse interpolator to synthetic data (a Swiss roll), respiratory 
gating in ultrasound, and MRI classification. We report the number of support 
vectors as a proxy for computational speed since wall-clock time is directly pro- 
portional to the number of support vectors. Furthermore, the datasets we use 
are still relatively small for the scenarios our method intends to address, making 
wall-clock time for the experiments we run not reflective of real use. However, our 
empirical results suggest that our method can work with larger datasets since 
the number of support vectors scales not with the size of the training dataset but 
instead with the complexity of the training data's low-dimensional embedding. 

For synthetic data, we use Hessian eigenmaps [5] for manifold learning, which,
to the best of our knowledge, does not have a known Nyström extension. For
the two experiments on real data, we use Laplacian eigenmaps [4] for manifold
learning and construct our sparse interpolator using the same kernel function as
the one used for Laplacian eigenmap's Nyström extension:

\[
\mathcal{K}(x, x') = \frac{W(x, x')}{\sqrt{\sum_{i=1}^{n} W(x, x_i) \sum_{i=1}^{n} W(x', x_i)}}, \tag{11}
\]

where $W : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$ is a heat kernel given by $W(x, x') = e^{-\|x - x'\|_2^2 / t}$ if
$\|x - x'\|_2 < r$ and 0 otherwise, for some pre-specified temperature $t$ and
nearest-neighbor threshold $r$, with both parameters chosen based on the application of
interest. We can also find the $k$ nearest neighbors rather than defining nearest
neighbors to be within a ball of radius $r$. With kernel function (11), constructing
our sparse interpolator with $\lambda = 0$ and $\varepsilon = 0$ yields Laplacian eigenmap's
Nyström extension that uses all the training points. We do not use the same
manifold learning algorithm for all datasets; the choice of manifold learning
algorithm depends on the dataset and the application of interest.
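
A possible NumPy rendering of this normalized kernel is sketched below. It reads kernel (11) as the heat-kernel weight between two points divided by the square root of the product of each point's summed weights to the training set; that reading, the function names (`heat_weights`, `laplacian_kernel`), the strict radius test, and the toy parameters are assumptions of the sketch, not details confirmed by the text.

```python
import numpy as np

def heat_weights(A, B, t, r):
    """Heat kernel exp(-||a - b||^2 / t), set to 0 outside a ball of radius r."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / t)
    W[sq >= r ** 2] = 0.0
    return W

def laplacian_kernel(A, B, X_train, t, r):
    """W(a, b) / sqrt(sum_i W(a, x_i) * sum_i W(b, x_i)) for rows a of A, b of B."""
    deg_a = heat_weights(A, X_train, t, r).sum(axis=1)
    deg_b = heat_weights(B, X_train, t, r).sum(axis=1)
    W = heat_weights(A, B, t, r)
    denom = np.sqrt(np.outer(deg_a, deg_b))
    # Points with no neighbors in the training set get kernel value 0.
    return np.divide(W, denom, out=np.zeros_like(W), where=denom > 0)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))
t, r = 1.0, 1.5                                   # illustrative values only
K = laplacian_kernel(X_train, X_train, X_train, t, r)
deg = heat_weights(X_train, X_train, t, r).sum(axis=1)

x_far = np.array([[100.0, 100.0]])                # no training point within r
K_far = laplacian_kernel(x_far, X_train, X_train, t, r)
```

A new point with no training neighbors within the radius receives an all-zero kernel row, so any interpolator built on this kernel maps it to the origin; whether that is acceptable depends on the application.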



4.1 Synthetic Data 



We apply our method to a Swiss roll with $n = 1000$ points, shown in Fig. 1(a).
First, we compute low-dimensional representations $y_1, \ldots, y_n \in \mathbb{R}^2$ using
Hessian eigenmaps [5] with a 7-nearest-neighbor graph. We construct our sparse
interpolator using kernel function $\mathcal{K}(x, x') = \exp(-\|x - x'\|_2^2 / \sigma^2)$. To probe the
behavior of our interpolator, we vary kernel ridge regression parameter $\lambda$, kernel
width $\sigma$, and error tolerance $\varepsilon$. Fig. 1 reports the resulting number of support
vectors and illustrates results for one setting of the parameters.
Fig. 1: Results for a Swiss roll with $n = 1000$ points: (a) the original 3D data
points; (b) their 2D embedding; (c) the number of support vectors as a function
of error tolerance $\varepsilon$ for various $\lambda$ and $\sigma$. For the remaining panels (d)-(f), we fix
$\lambda = 0.1$, $\sigma = 4$, and $\varepsilon = 0.003$: (d) the 161 support vectors found; (e) our
approximated 2D embedding of support vectors; (f) a comparison of 2D embeddings
from our method and kernel ridge regression (lines show correspondences).



We observe that the support vectors are not uniformly sampled in the input
space nor on the learned 2D manifold. Instead, they appear along the
boundaries or form a skeleton within the learned manifold. We also observe in Fig. 1(f)
that the largest discrepancies in the predicted point locations between our sparse
interpolator and kernel ridge regression occur along the boundaries.
Unsurprisingly, increasing kernel width $\sigma$ reduces the number of support vectors needed
to achieve the same error tolerance $\varepsilon$, as each support vector has broader spatial
influence in the input space. Furthermore, increasing kernel ridge regression
regularization parameter $\lambda$ also reduces the number of support vectors, as discussed
in Section 3.

By repeating this experiment using a Swiss roll with $n = 2000$, $n = 3000$,
and $n = 4000$ points, we empirically find that for a variety of parameter settings
$\lambda$, $\sigma$, and $\varepsilon$, the number of support vectors remains roughly constant as $n$ grows
large. For example, with $\lambda = 0.1$, $\sigma = 4$, and $\varepsilon = 0.003$, we obtain 161, 174, 163,
and 170 support vectors for $n = 1000, 2000, 3000, 4000$ points, respectively. This
suggests that the number of support vectors depends on the low-dimensional
embedding's complexity and not on the dataset size $n$.
Fig. 2: Ultrasound gating. Top: Ultrasound images of the liver over time (abdomen,
right upper quadrant). Bottom left: Correlation coefficient vs. error tolerance
$\varepsilon$. Bottom right: The number of support vectors vs. error tolerance $\varepsilon$. Both
figures in the bottom report results for different values of kernel ridge regression
regularization parameter $\lambda$.



4.2 Respiratory Gating of Ultrasound Images 

Respiratory gating tracks a patient's breathing cycle, which has numerous
applications such as 4D imaging, radiation therapy, and image mosaicing [15].
Manifold learning has been used for highly accurate respiratory gating of
ultrasound images [23], where 4D data reconstruction was achieved with retrospective
gating, i.e., the gating was calculated after the data acquisition was finished. We 
extend this work to attain real-time gating. A small number of breathing cycles 
are acquired and used as input for manifold learning to construct the respira- 
tory signal, as is done for retrospective gating. The new incoming stream of 
ultrasound images is then gated by performing an out-of-sample extension. 

We conduct experiments on five 2D ultrasound image sequences of the human
liver acquired during free breathing; example images are shown in Fig. 2. Each
sequence consists of 640 × 480-pixel images captured at 33 Hz, and the sequences
vary in length between 298 and 371 frames. For a given image sequence, we use each
image in the sequence as an input data point for learning a 1D manifold with
Laplacian eigenmaps [4]; we use a 9-nearest-neighbor graph with an associated heat
kernel of temperature $t = 10$. The 1D embedding learned using an entire sequence
of images serves as a reference signal for evaluating our sparse out-of-sample
extension versus kernel ridge regression as the baseline. In what follows, we
compare the 1D embedding of our sparse out-of-sample extension to the reference
signal by computing a correlation coefficient between them. We use kernel ridge
regression as a baseline method. Here we train on the first 200 frames and test
on the remaining frames. We then compare the results with those obtained by
training on all frames, as would be done for retrospective gating.

Sparse Projections of Medical Images onto Manifolds

                     Learning on first 200 frames       Learning on entire data
Data     # Frames    CC (KRR)  CC (sparse)  # SV's      CC (KRR)  CC (sparse)  # SV's
Seq. 1   354         96.5%     96.4%        79          99.9%     96.9%        73
Seq. 2   335         97.7%     97.5%        99          99.9%     98.6%        100
Seq. 3   298         98.3%     97.8%        51          99.3%     98.9%        61
Seq. 4   371         99.7%     99.4%        53          99.6%     99.7%        45
Seq. 5   298         99.0%     98.7%        41          99.9%     99.5%        50

Table 1: Results for respiratory gating on ultrasound images. For each image
sequence, we show the number of frames it contains, the correlation coefficient
(CC) for kernel ridge regression (KRR) and our sparse interpolator, and
the number of support vectors (SV's). Parameter values: $\lambda = 0.1$, $\varepsilon = 0.001$.

We first examine the influence of parameters $\varepsilon$ and $\lambda$ on the resulting
interpolator. Training on the first 200 images of one of the ultrasound image sequences,
we compute the correlation coefficient with the reference signal and the number
of support vectors versus the error tolerance $\varepsilon$ (Fig. 2). As expected, smaller
error tolerance $\varepsilon$ requires more support vectors but also leads to a higher
correlation coefficient with respect to the reference signal. Also, a higher kernel ridge
regression regularization parameter $\lambda$ leads to fewer support vectors. However,
stronger regularization also leads to lower correlation coefficients. These results
suggest a natural tradeoff between the accuracy and the computational cost of 
the projection operation. 

In the next experiment, we use $\lambda = 0.1$ and $\varepsilon = 0.001$. Training on the
first 200 frames and testing on the rest of the frames, we report the correlation
coefficients and the number of support vectors in Table 1. The number of
support vectors for kernel ridge regression is 200 in this case. We then repeat 
the experiment, training on all the frames. In this case, the number of support 
vectors for kernel ridge regression is the length of the sequence. We achieve a 
high correlation for all sequences, with a comparable performance between our 
sparse interpolator and kernel ridge regression. Comparing the number of sup- 
port vectors when training on the first 200 frames vs. training on all the frames, 
we note that the number of support vectors stays roughly the same for a given 
image sequence. This again suggests that the number of support vectors depends 
on the low-dimensional embedding's complexity and not the training set size. 

4.3 Patient Position Estimation Using MRI 

The radio frequency power in magnetic resonance imaging leads to tissue heating 
and has to be monitored by measuring the specific absorption rate, which de- 
pends on the position of the patient in the scanner. For current high-resolution 
scanners, this imposes restrictions because either fewer slices can be acquired
or the in-plane resolution has to be reduced. Manifold learning can be used to
estimate the position of the patient in the scanner [22].

Fig. 3: Left: Coronal plane of MRI scan showing the entire patient. Right: Axial
slices on which manifold learning is performed.

Fig. 4: Leave-one-out classification results for MRI data: (a) classification rate
vs. $\varepsilon$ for our sparse out-of-sample extension (solid line) and kernel ridge regression
(dotted line); (b) the number of support vectors vs. classification rate; (c) the
number of support vectors vs. error tolerance $\varepsilon$. All figures report results for
different values of kernel ridge regression regularization parameter $\lambda$.

First, low-resolution images are acquired while the bed that the patient lies 
on moves inside the scanner. The images are embedded in a low-dimensional 
space, where each axial image is associated with a body part (head, neck, lung, 
etc.) using a nearest-neighbor classifier. By knowing which slices correspond to 
which body parts, we can estimate the position of the patient in the scanner. 
It is important that the estimation be done in real-time to provide the position 
information before the high-resolution scan starts. In this application, we can 
apply manifold learning offline on a large database of scans. Then during the 
actual scan, we use an out-of-sample extension to project the acquired slices 
into the low-dimensional space. For large training datasets, it may be difficult 
to meet the time requirements with kernel ridge regression. Consequently, the 
reduction to a small set of support vectors offers a substantial advantage. 

We run experiments on 13 whole body scans, such as the example shown
in Fig. 3. A medical expert assigned an anatomical label (head, neck, lung,
abdomen, upper leg, and lower leg) to each of the axial slices (64 × 64 pixels).
We apply Laplacian eigenmaps to embed the high-dimensional slices in a
two-dimensional space; we use a 40-nearest-neighbor graph with a heat kernel of
temperature $t = 49$. To predict the anatomical label of an axial image, we
perform nearest-neighbor classification in the learned low-dimensional space. We
repeat this classification procedure for different values of error tolerance $\varepsilon$ ranging
from $1 \times 10^{-4}$ to $5 \times 10^{-4}$.
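
The classification step can be illustrated with the following Python/NumPy sketch. It assumes that embeddings are already available and that one point is held out at a time; the exact leave-one-out protocol (for example, holding out a whole scan) is not spelled out above, so this choice, the function names (`nn_classify`, `leave_one_out_rate`), and the toy embeddings are hypothetical.

```python
import numpy as np

def nn_classify(z_new, Z_ref, labels_ref):
    """Label of the reference embedding closest to z_new (Euclidean distance)."""
    d2 = ((Z_ref - z_new) ** 2).sum(axis=1)
    return labels_ref[int(np.argmin(d2))]

def leave_one_out_rate(Z, labels):
    """Fraction of points whose nearest other point carries the same label."""
    hits = 0
    for i in range(len(Z)):
        mask = np.arange(len(Z)) != i          # hold out point i
        hits += nn_classify(Z[i], Z[mask], labels[mask]) == labels[i]
    return hits / len(Z)

# Toy 2D embeddings with anatomical labels (illustrative values only).
Z = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.6, 0.6]])
labels = np.array(["head", "head", "lung", "lung", "head"])
rate = leave_one_out_rate(Z, labels)           # 4 of 5 correct here
```

In the real-time setting, `z_new` would be the output of the sparse interpolator for a newly acquired slice, so the cost per slice is the projection plus one nearest-neighbor search in two dimensions.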

We compare the classification performance of embeddings obtained from our
sparse interpolator and kernel ridge regression. Fig. 4(a) reports leave-one-out
classification performance for different values of error tolerance $\varepsilon$. The
classification rates for kernel ridge regression are provided for comparison; they do
not change for different values of $\varepsilon$. Figs. 4(b) and 4(c) characterize the sparsity
of the interpolation function constructed by reporting the number of support
vectors as a function of the classification rate and error tolerance $\varepsilon$. The total
number of frames used in this experiment is 2697, which corresponds to the
number of support vectors for kernel ridge regression. We observe a clear correlation
between error tolerance $\varepsilon$ and the classification performance. Smaller values of $\varepsilon$
lead to better classification performance but require more support vectors. Thus,
we can trade off computational speed against classification performance by tuning
parameters $\lambda$ and $\varepsilon$ to be as large as possible while maintaining a classification
rate above a minimum tolerated threshold.



5 Conclusion 

We derived a novel method for multivariate regression that approximates kernel 
ridge regression, where the final estimated interpolation function depends only 
on a subset of the original input points acting as support vectors. Our approach 
provides a guarantee on the approximation error for training data. We applied 
our method as an out-of-sample extension for manifold learning, illustrating 
applications to respiratory gating and MRI classification. 

Turning toward nonlinear dimensionality reduction more generally, many 
widely used algorithms are computationally expensive for massive datasets. Thus, 
ideally we would like to find support vectors first, before applying dimension- 
ality reduction. Our results suggest that the support vectors for interpolation 
may not be uniformly sampled in the input space. This invites the question 
of how to non-uniformly sample training data in the input space and adjust a 
dimensionality reduction algorithm accordingly to account for the geometry of 
these samples. 